Abstract


Much of modern stochastic control theory uses ideal white noise driven
models (Ito equations). If the observed data is corrupted by noise, then the
noise is usually assumed to be 'white Gaussian'. Typically, if the underlying
models are linear, one uses a Kalman-Bucy filter to get an estimate of the
state, and then bases the control on this estimate. In practice, the noises are
rarely 'white', and the reference signals and the systems are only
approximations in some sense to a diffusion. Nevertheless, owing to lack of
viable alternatives, one still uses the Kalman-Bucy filter, etc. Then the
estimates are not optimal and, indeed, might be quite far from being optimal.
Similarly for the corresponding control. (Examples are given to illustrate this.)
The sense in which the estimates and/or control are useful needs to be examined
in order to justify the use of the commonly used procedure. The issue is
much deeper than mere 'robustness' in the usual sense, since basic questions of
interpretation of the results are involved. The paper deals with these
questions. For the filtering problem where the signal is a 'near'
Gauss-Markov process and the observation noise wide band, it is shown that
the usual method is 'nearly optimal' with respect to a class of alternative data
processors. This alternative class is rather natural and includes the data
processors which one would normally want to use. It is unlikely that the class
can be enlarged very much in general. The asymptotic (in time and bandwidth)
problem is treated, as is the (much harder) conditional Gaussian case, and a
case where the observations are non-linear. The basic techniques are those of
weak convergence theory. Similar results are obtained for the combined
filtering and control problem.


Introduction 


In much of modern control and filtering theory, one uses ideal white
noise driven models of the following type, where $W_z(\cdot)$, $W_x(\cdot)$ and
$W_y(\cdot)$ are standard Wiener processes, $u(\cdot)$ is a control, and $b_z$,
$\sigma_z$, etc., are appropriate functions. We let $z(\cdot)$ denote a reference
signal, $x(\cdot)$ the control system, $y(\cdot)$ the (integral of the) noise
corrupted observation, and $r_T(u)$ and $r(u)$ the cost functions.

(1.1)  $dz = b_z(z)\,dt + \sigma_z(z)\,dW_z$

(1.2)  $dx = b_x(x,u)\,dt + \sigma_x(x)\,dW_x$

(1.3)  $dy = h(x,z)\,dt + dW_y$

(1.4)  $r_T(u) = E \int_0^T k(x(s), z(s), u(s))\,ds$

(1.5)  $r(u) = \lim_{T \to \infty} r_T(u)/T$.

Of course, the actual physical system, which we denote by $z^\epsilon(\cdot)$,
$x^\epsilon(\cdot)$, $y^\epsilon(\cdot)$ (reference signal, control system,
integral of the physical observation noise), is not of the form (1.1) - (1.3).
The reference signal $z^\epsilon(\cdot)$ might be a 'near diffusion' - only
approximately representable by (1.1), and the noise in the control and
observation system would rarely be 'white'. But, typically, one somehow decides
upon a suitable model (1.1) - (1.3), attempts to determine a good or 'nearly
optimal' control for that model, and then applies this control to the actual
physical system. In such a context, one must naturally question the value of
the determined control for the 'physical' problem, as well as the value of the
output of the filter (even for any 'nice' fixed control) for making estimates
of functionals of the physical process $z^\epsilon(\cdot)$ which is approximated
by $z(\cdot)$.

The filter output will rarely be even nearly optimal for use in making
such estimates, and the control (based on the filter outputs) will rarely be
'nearly optimal'. Very little attention has been devoted to such problems - yet
they are at the core of the problem of relevance of much theoretical work. An
important theory of robustness has been developed [9], [10] - in which one
tries to construct a filter in which the output is a continuous function of the
input. The idea is that the model would be (1.1), (1.3), but with $W_y(\cdot)$
replaced by something else. Such robustness is very useful. But the very
raising of such a robustness issue implies that the noises might not be white.
If that is the case, what is the value of the filter (robust or not) - or of
controls based on the filter output? Unless one is willing to assume more,
there is no statistical interpretation of the output of such a filter.
Furthermore, robustness must deal with the full control/filtering problem,
correlation between the systems, the asymptotic (average cost per unit time)
problem, $z(\cdot)$, $x(\cdot)$ replaced by 'near diffusions', etc. We will deal
with all these questions here, when the approximating system (1.1), (1.2) is
linear - for which a fairly complete theory can be obtained.

Owing to the usual lack of 'near optimality' (for the physical system) of
the filter and control which is obtained by using (1.1) - (1.3), one can only
ask the question: with respect to which alternative filters ('data processors') or
controls for the physical system are the chosen ones nearly optimal? It turns
out, under quite broad conditions, that this class of alternative filters and
controls is quite large and very reasonable. Such results are essential, if the
use of the ideal models (1.1) - (1.3) is to make sense in a large part of the
applications.

The basic mathematical techniques used here are those of the theory of
weak convergence of probability measures [1], [3], [4], a group of methods
which are quite powerful for dealing with many difficult approximation
problems in control and communication theory (and elsewhere) [1], [5] - [8],
[14], and [15]. The basic questions of approximation here are closely related
to those of the convergence of the sequence of physical processes
$(z^\epsilon(\cdot), x^\epsilon(\cdot), y^\epsilon(\cdot))$ to the ideal model
(1.1) - (1.3), as the 'bandwidth' of the driving noises (say, $1/\epsilon^2$)
goes to infinity.

We begin with a discussion of the pure filtering problem. Here - for the
case where the ideal model is linear - one would simply use the Kalman-Bucy
filter for the ideal model - but whose input is the physical observation.
Obviously, the filter does not usually yield the conditional distribution of
$z^\epsilon(t)$ given the data $y^\epsilon(s)$, $s \le t$. In Section 2, we discuss
some counterexamples to illustrate the sort of difficulties which arise in such
approximations, and in Section 3 the approximation theorem is given, together
with the class of alternative data processors. Section 4 concerns the average
filter error per unit time - or the errors for large time. We show that the
filter output can be used to obtain estimates of a wide class of functionals of
$z^\epsilon(\cdot)$, which are good with respect to a very broad and natural
class of alternative estimators. The examples in Section 2 illustrate why they
would not be 'nearly optimal' in general. In Sections 5 and 6, we treat the
conditional Gaussian case, and a case where the observation is non-linear, and
in Section 7, the non-linear observation case for large time. The power of the
weak convergence approach should be amply evident in these sections. The
conditional Gaussian case must be treated with some care, owing to the
interaction between the wide bandwidth noise and the 'conditional Gaussian'
coefficients. It is particularly important that any robustness or approximation
theory be able to treat the large time - large bandwidth problem, and the
conditional Gaussian case and, at the moment, there seems to be no alternative
to the weak convergence point of view for this.

The combined filtering and control problem is dealt with in Section 9.
The optimal control for (1.1) - (1.3) will be nearly optimal for the physical
system - in comparison with a large class of alternative controls. Appendix 1
contains some definitions concerning weak convergence. We will use the arrow
$\Rightarrow$ to denote weak convergence. We have tried to formulate the models
and results so that the paper is not burdened with a large amount of weak
convergence theory or calculations - and so that available references can be
used where possible. There are extensions in many directions: discrete
parameter problems, impulsive control, etc., all treated very similarly to the
treatment here.


2.  ’Nearly’  Optimal  Linear  Filtering:  Formulation  and  Preliminaries 


In the next few sections, we consider the following filtering problem:
For each $\epsilon > 0$, $z^\epsilon(\cdot)$ is a signal process,
$\xi_y^\epsilon(\cdot)$ is a 'wide-bandwidth' observation noise, the two are
mutually independent and right continuous (with left hand limits), and the
actual observation process is $\{\dot y^\epsilon(t),\ t \ge 0\}$:

(2.1)  $\dot y^\epsilon(t) = H_z z^\epsilon(t) + \xi_y^\epsilon(t)$,  $y^\epsilon(0) = 0$.

The 'dependent' case can readily be handled. It is omitted in order to simplify
the notation. Define $y^\epsilon(t) = \int_0^t \dot y^\epsilon(s)\,ds$ and
$W_y^\epsilon(t) = \int_0^t \xi_y^\epsilon(s)\,ds$. Let $z(\cdot)$ be a
Gauss-Markov process satisfying

(2.2)  $dz = A_z z\,dt + B_z\,dW_z$,

where $W_z(\cdot)$ is a standard Wiener process. The $A_z$, $B_z$ are constant
matrices, although they could be time-dependent in all parts, except those
where $t \to \infty$.

We are concerned with the case where $\xi_y^\epsilon(\cdot)$ is 'nearly' white
noise, and $z^\epsilon(\cdot)$ is 'nearly' a Gauss-Markov diffusion, and hence
suppose that

(2.3)  $(z^\epsilon(\cdot), W_y^\epsilon(\cdot)) \Rightarrow (z(\cdot), W_y(\cdot))$  as $\epsilon \to 0$,

where $W_y(\cdot)$ is a non-degenerate Wiener process. By the weak convergence
and independence of $z^\epsilon(\cdot)$ and $\xi_y^\epsilon(\cdot)$, $z(\cdot)$
is independent of $W_y(\cdot)$. The weak limit of $\{y^\epsilon(\cdot)\}$ is
$y(\cdot)$:

(2.4)  $dy = H_z z\,dt + dW_y$,  $y(0) = 0$.

Let $y \in R^k$, Euclidean $k$-space, and $z \in R^m$.

The actual physical system is 'fixed' and corresponds to some small
$\epsilon > 0$. The use of weak convergence here is just a way of embedding the
actual data in a sequence - so that an approximation method can be used. We
work with the 'near diffusion' $z^\epsilon(\cdot)$ and 'wide bandwidth' noise
$\xi_y^\epsilon(\cdot)$. But to evaluate the filter that we design by using the
ideal model (2.2), (2.4), but with actual input $y^\epsilon(\cdot)$, the weak
convergence method is very useful. W.p.1 convergence ideas are inappropriate in
our context and would (in any case) restrict our flexibility. The
'distributional' information contained in the weak convergence is all that is
needed, since the filters are evaluated by computing expectations of prediction
errors. Similarly, the value of a control is evaluated via an expectation of a
cost function - so only distributional information is needed.

We are interested in approximating the value of expectations of
functions of $z^\epsilon(\cdot)$, conditioned on the data $y^\epsilon(\cdot)$.
This is not easy. Except (and even then, rarely) for the special stationary and
Gaussian cases of the classical Wiener theory, it is nearly impossible.
Furthermore, if robustness is the issue, then we cannot restrict ourselves to
Gaussian noise - since it itself is only an approximation to the physical
processes.


For (2.2), (2.4), the classical Kalman-Bucy filter equations are

(2.5)  $d\hat z = A_z \hat z\,dt + Q(t)\,[dy - H_z \hat z\,dt]$,  $Q(t) = \Sigma(t) H_z'$,

(2.6)  $\dot\Sigma = A_z \Sigma + \Sigma A_z' + B_z B_z' - \Sigma H_z' R_0^{-1} H_z \Sigma$,

where $R_0$ = covariance matrix of the observation 'noise' $W_y(1)$, which
(w.l.o.g., by a simple rescaling) we set to $I$, unless mentioned otherwise. In
practice, with physical wideband observation noise and the signal only a 'near'
Gauss-Markov process, one normally uses (2.6) and the 'natural' adjustment of
(2.5), namely

(2.5wb)  $\dot{\hat z}^\epsilon = A_z \hat z^\epsilon + Q(t)\,[\dot y^\epsilon - H_z \hat z^\epsilon]$.

We want to know in what way the pair (2.5wb), (2.6) makes sense.
Typically, it is not an optimal - or even nearly optimal - filter for the
physical observation. But, as will be seen, it makes a great deal of sense and
is quite appropriate in a specific but important way. One cannot ask whether
it is 'nearly optimal' - but, rather, with respect to what class of alternative
estimators is it 'nearly optimal' when estimating specific functionals of
$z^\epsilon(\cdot)$. Weak convergence theory provides a natural tool for
answering this question.
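To make the pair (2.5wb), (2.6) concrete, the following is a minimal scalar sketch, not the paper's computation. It assumes a one-dimensional signal with hypothetical coefficients `A`, `B`, `H` (standing in for $A_z$, $B_z$, $H_z$), takes $R_0 = 1$, integrates the scalar Riccati equation by forward Euler, and compares the result with the positive root of the stationary equation. The function `filter_step` is an illustrative Euler step of the wideband filter driven by a sample of the physical observation rate.

```python
import math


def riccati_steady_state(a, b, h):
    """Positive root of 0 = 2*a*S + b**2 - (h*S)**2 (scalar (2.6) with R0 = 1)."""
    return (a + math.sqrt(a * a + b * b * h * h)) / (h * h)


def integrate_riccati(a, b, h, sigma0, t_end, dt=1e-3):
    """Forward-Euler integration of dS/dt = 2*a*S + b**2 - (h*S)**2."""
    sigma = sigma0
    for _ in range(int(round(t_end / dt))):
        sigma += dt * (2.0 * a * sigma + b * b - (h * sigma) ** 2)
    return sigma


def filter_step(z_hat, y_dot, a, h, gain, dt):
    """One Euler step of the scalar analog of (2.5wb) with input sample y_dot."""
    return z_hat + dt * (a * z_hat + gain * (y_dot - h * z_hat))


# Hypothetical coefficients, chosen only for illustration.
A, B, H = -1.0, 1.0, 2.0
sigma_bar = riccati_steady_state(A, B, H)
sigma_t = integrate_riccati(A, B, H, sigma0=1.0, t_end=10.0)
gain = sigma_t * H  # Q = Sigma * H' in the scalar case
```

In this sketch the integrated covariance settles at the stationary value, and the closed-loop coefficient `A - gain * H` is negative, so the filter dynamics are stable.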

Some of our results are related to those in [2], which concerns a non-linear
filtering problem. But, for our specific case, it is possible to go further and
get much more information fairly readily, and to treat the asymptotic (in time
as well as in bandwidth) problem, various non-linear observation functions, the
conditional Gaussian case, and the combined filtering and control problem;
hence the overlap with [2] is very small.


Before proceeding, it is useful to consider several simple examples which
illustrate the problems that we must contend with, particularly concerning the
difference in the 'information' contained in the (integral of the) physical
observation process $y^\epsilon(\cdot)$ and in the ideal (limit) $y(\cdot)$, and
the possible lack of continuity in the optimal estimators as the noise bandwidth
goes to $\infty$. Let $(X_n, Y_n)$ be bounded real-valued random variables which
converge in distribution (or even w.p.1) to $(X, Y)$. Generally
$E(X_n | Y_n) \not\to E(X | Y)$. $X_n$ might be a physical signal and $Y_n$ the
physical observation, with the pair $(X_n, Y_n)$ close in distribution to a much
simpler pair $(X, Y)$.

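Consider the example where the lack of convergence is particularly evident:

Example 1. $X_n = X$, $Y_n = X/n$. Here $E(X_n | Y_n) = X$ for each $n$, while
the limit observation $Y = 0$ carries no information, so that $E(X | Y) = EX$.

Example 2 illustrates a related pathology. If $Z_n$ is a function of $Y$, where
$Y$ is a random variable, and $(Z_n, Y) \Rightarrow (Z, Y)$ (or even converges
w.p.1), then $Z$ is not generally a function of $Y$.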


Example 2. Let $Y$ be uniformly distributed on $[0,1]$. Define $Z_n = nY$ for
$0 \le Y < 1/n$ and, in general, define $Z_n = (nY - k)$ on
$k/n \le Y < (k+1)/n$, $k = 0, 1, \ldots, n-1$. $Z_n$ is a 'sawtooth' function of
$Y$. Also $(Z_n, Y) \Rightarrow (Z, Y)$, where $Z$ is independent of $Y$, and
both $Z$ and $Y$ are uniformly distributed on $[0,1]$. Clearly
$E(Z_n | Y) \not\to E(Z | Y)$ in any sense.
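The sawtooth construction of Example 2 can be checked numerically. The sketch below is only illustrative: the helper names and the midpoint-rule averaging are assumptions made here, not part of the paper. It shows that $Z_n$ is an exact function of $Y$ for every $n$, yet its average over a fixed window of $Y$ values approaches $1/2 = EZ$ for large $n$, which is the behavior one expects when the limit $Z$ is independent of $Y$.

```python
import math


def sawtooth(y, n):
    """Z_n = n*y - k on k/n <= y < (k+1)/n, i.e. the fractional part of n*y."""
    return n * y - math.floor(n * y)


def mean_on_window(n, lo, hi, grid=100_000):
    """Average of Z_n for Y uniform on [lo, hi), computed by a midpoint rule."""
    width = (hi - lo) / grid
    total = sum(sawtooth(lo + (j + 0.5) * width, n) for j in range(grid))
    return total / grid


# For n = 1 the window average still reflects the location of the window.
coarse = mean_on_window(n=1, lo=0.0, hi=0.25)
# For large n the window average is close to 1/2, whatever the window.
fine = mean_on_window(n=1000, lo=0.0, hi=0.25)
```

Since $Z_n$ is a function of $Y$, the mean square error of $E(Z_n | Y)$ is zero for every $n$, while the best estimate of the limit $Z$ given $Y$ is the constant $1/2$.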


Example 2 arises when we have a sequence of estimators, say
$Z_\epsilon(y^\epsilon(\cdot))$, using the data $y^\epsilon(\cdot)$. Even if the
pair $(Z_\epsilon(y^\epsilon(\cdot)), y^\epsilon(\cdot))$ converges to a pair
$(Z, y(\cdot))$, the limit $Z$ might not be a function of $y(\cdot)$. Here the
limit is, in fact, independent of the data $y(\cdot)$. The same holds for a
control problem using noise corrupted data, say $y^\epsilon(\cdot)$: the limit
control might be independent of the limit data!

Even though $W_y^\epsilon(\cdot) \Rightarrow W_y(\cdot)$, a non-degenerate Wiener
process, $y^\epsilon(\cdot)$ might contain a great deal more information about
$z^\epsilon(\cdot)$ than $y(\cdot)$ does about $z(\cdot)$. For an extreme case,
consider

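Example 3. Let $t_i^\epsilon$, $i \ge 0$, be a strictly increasing sequence of
real numbers for each $\epsilon$, such that $t_i^\epsilon \to \infty$ and
$\sup_i |t_{i+1}^\epsilon - t_i^\epsilon| \to 0$. Define
$I^\epsilon = \bigcup_i [t_{2i}^\epsilon, t_{2i+1}^\epsilon]$, and for any
$t > 0$, let the Lebesgue measure of $I^\epsilon \cap [0,t]$ go to 0. Define a
'new' observation noise $\xi_y^\epsilon(\cdot)$ by resetting
$\xi_y^\epsilon(t) = 0$ for $t \in I^\epsilon$.
The integral of the new $\xi_y^\epsilon(\cdot)$ still converges weakly to the
Wiener process $W_y(\cdot)$. But $H_z z^\epsilon(\cdot)$ is known nearly exactly
for small $\epsilon$. There are even forms of this example for which the new
$\xi_y^\epsilon(\cdot)$ is stationary.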

These examples are admittedly pathological. But we are working with
vague concepts such as 'wide bandwidth' observation noise, 'near' Gauss-Markov
processes, and with integrals of the observation (as one always does in modern
filtering theory). The examples do caution us to take considerable care. The
examples showed that we might lose information in going to the limit. The
following lemma (whose truth was first told to the author by T. Kurtz) shows
the sense in which we never gain information on going to the limit (i.e., as
the noise bandwidth goes to $\infty$).


Definition. For a set $G$, $\partial G$ = (closure of $G$) minus (interior of
$G$) = boundary of $G$. For a random variable $Y$, let $B(Y)$ be the minimal
$\sigma$-algebra measuring $Y$, and let $I_G(Y)$ denote the indicator function
of the set $\{\omega : Y \in G\}$.

Lemma 2.1. Let $(X_n, Y_n) \Rightarrow (X, Y)$ ($X_n$ real valued, $Y_n$ with
values in a Euclidean space). Then

(2.7)  $E[X_n - E(X_n | Y_n)]^2 \le E[X - E(X | Y)]^2$.

Remark. In Examples 1 to 3, the inequality is strict.

The proof is in Appendix 2. There is a similar result when $Y_n$ is
replaced by a (cadlag: right continuous with left hand limits) random process.


3. A Class of Data Processors (Estimators)


For the ideal filtering problem with data (2.2), (2.4), the optimal decision
functions are functions of the estimates $\hat z(\cdot)$, $\Sigma(\cdot)$, since
these completely determine the conditional distribution. There are no
alternative (admissible) functions of the data $y(\cdot)$ which are better. This
is not so with estimates based on $\Sigma(\cdot)$, $\hat z^\epsilon(\cdot)$ for
the system $z^\epsilon(\cdot)$, $y^\epsilon(\cdot)$. We now define a large class
of functions of the observed data $y^\epsilon(\cdot)$ with respect to which
functions of $\hat z^\epsilon(\cdot)$, $\Sigma(\cdot)$ are 'nearly optimal' for
small $\epsilon > 0$, and a large class of risk or cost functions. In order to
know how good estimates based on $\hat z^\epsilon(\cdot)$, $\Sigma(\cdot)$ are
for getting information on $z^\epsilon(\cdot)$, we need to specify both a class
of (observation data dependent) alternative estimators and a criterion of
comparison; i.e., a cost function. We work with only one particular cost
function - but the general idea and the natural extensions should be clear, and
the method works with 'typical' cost functions.

Let $D$ denote the class of measurable functions on $C[0,\infty)$, the space of
real valued continuous functions on $[0,\infty)$ (with the topology of uniform
convergence on bounded intervals), which are continuous w.p.1 relative to
Wiener measure (hence, with respect to the measure of $y(\cdot)$). Let $D_t$
denote the subclass which depends only on the function values up to time $t$.
For arbitrary $F(\cdot) \in D$ or in $D_t$, we will use $F(y^\epsilon(\cdot))$
as an alternative estimator of a functional of $z^\epsilon(\cdot)$. The class is
quite large, as will now be seen.

First, note that $D$ contains all continuous functions and that the
$\hat z(\cdot)$ of (2.5) can be written as a continuous function of the integral
of the driving force $y(\cdot)$. [To see the latter point, solve (2.5) - in the
form of a Wiener integral and do an integration by parts.] Thus, continuous
functions of $\hat z^\epsilon(\cdot)$ are admissible estimators. Many important
functionals are only continuous w.p.1 (relative to Wiener or $y(\cdot)$
measure). For some integer $n$, let $A$ be a Borel set in $R^{nk}$ (with
$\partial A$ having zero Lebesgue measure). Then [3], the function
$I_A(x(t_1), \ldots, x(t_n))$ is in $D_t$ for any $t_1, \ldots, t_n \le t$, where
$x(\cdot)$ denotes the canonical function in $C[0,\infty)$. Let
$\tau(x(\cdot))$ denote the first time that a closed set $A$ with a piecewise
differentiable boundary is reached by $x(\cdot)$. Then the function with values
$T \wedge \tau(x(\cdot))$ is in $D_T$ for any $T < \infty$. Thus, our
alternative estimators can involve stopping times. This is essential in
sequential decision problems or whenever the cost or risk function involves
first entrance times of a function of $y(\cdot)$ into a decision set.
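The bracketed integration-by-parts remark above can be spelled out as follows. This is a sketch under stated assumptions rather than the paper's own derivation: $\Phi(t,s)$ denotes the fundamental matrix of $A_z - Q(t)H_z$ (so $\Phi(t,t) = I$), $Q(\cdot)$ is taken to be continuously differentiable, and $y(0) = 0$ as in (2.4).

```latex
\begin{aligned}
\hat z(t) &= \Phi(t,0)\,\hat z(0) + \int_0^t \Phi(t,s)\,Q(s)\,dy(s) \\
          &= \Phi(t,0)\,\hat z(0) + Q(t)\,y(t)
             - \int_0^t \frac{\partial}{\partial s}\bigl[\Phi(t,s)\,Q(s)\bigr]\,y(s)\,ds .
\end{aligned}
```

The last line involves only the values of $y(\cdot)$ on $[0,t]$ and no stochastic integral, which is why the filter output depends continuously on the path of the integrated observation in the topology of uniform convergence on bounded intervals.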

$D$ and $D_t$ do not contain 'wild' functions such as those involving
differentiation. We consider $D$ and $D_t$ as a class of data processors. It
seems to contain a large enough class for practical applications when the
corrupting noise is 'white'. For the 'limit' (white observation noise, system
$z(\cdot)$) problem, one would usually want processors that are continuous
functions (w.p.1) of the data $y(\cdot)$. See the comments following the theorem
statement below.

The following is one of the main 'robustness' or 'approximation' results. For
a function $q(z)$, we write $(P_t^\epsilon, q)$ for the integral of $q(z)$ with
respect to the Gaussian distribution with mean $\hat z^\epsilon(t)$ and
covariance $\Sigma(t)$ - the ersatz conditional measure of $z^\epsilon(t)$. We
let the $q(\cdot)$ and $F(\cdot)$ below be bounded, but the theorem holds if
$\{(P_t^\epsilon, q)^2,\ q^2(z^\epsilon(t)),\ F^2(y^\epsilon(\cdot))\}$ is
uniformly integrable.


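Theorem 3.1. Assume the conditions on $z^\epsilon(\cdot)$, $W_y^\epsilon(\cdot)$
of Section 2. Then
$(z^\epsilon(\cdot), \hat z^\epsilon(\cdot), W_y^\epsilon(\cdot)) \Rightarrow (z(\cdot), \hat z(\cdot), W_y(\cdot))$.
Let $F(\cdot) \in D_t$ be bounded, and $q(\cdot)$ bounded, continuous and real
valued. Then (the limits all exist)

(3.1)  $\lim_\epsilon E[q(z^\epsilon(t)) - F(y^\epsilon(\cdot))]^2 \ge \lim_\epsilon E[q(z^\epsilon(t)) - (P_t^\epsilon, q)]^2$.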


Remark. The theorem states that (for small $\epsilon$) the ersatz conditional
distribution is 'nearly optimal' with respect to a specific (but broad) class of
alternative estimators. The alternative class includes those that make sense to
use when the corrupting noise is white. If the noise is wide band, then it
might not make sense to exploit its detailed structure and use other 'better'
estimators. Doing so might, in practical cases, cause processing errors and
other (unmodelled) noise effects. We chose the estimate of the conditional
mean at $t$ in (3.1) for illustrative purposes. Many other cost or risk
functionals could be considered; e.g., integrals of estimation errors - or the use
of the estimates for control purposes (see below). The comment on stopping
times in the paragraph above the theorem is useful for sequential estimation -
where one stops when some function of the data first hits a decision set.

The assertion concerning the weak convergence is obvious, but necessary,
since we need to know that the limit of the cited $\epsilon$-triple represents a
true filtering problem - with all three components, the system $z(\cdot)$, the
filter $\hat z(\cdot)$ and the observation noise (integral) $W_y(\cdot)$. The
result would not make sense if only 2 out of the 3 components converged.


4. The Large Time Problem (Large t, small $\epsilon$)


The filtering system often operates over a very long time interval. For
the model (2.2), (2.4), one would then use the stationary filter, and any
acceptable method of analysis should be able to handle this 'large time'
problem. But with the system $y^\epsilon(\cdot)$, $z^\epsilon(\cdot)$, two limits
are involved since both $t \to \infty$ and $\epsilon \to 0$, and it is important
that the results not depend on how $t \to \infty$ and $\epsilon \to 0$, and that
the use of the stationary limit filter is justified. The weak convergence method
is well set up to handle this problem. For convenience we make some additional
assumptions.

C4.1. $A_z$ is stable. $(A_z, H_z)$ is observable and $(A_z, B_z)$ controllable.

By (C4.1), (2.6) has a unique, positive definite stable limit $\bar\Sigma$. The
second part of (C4.2) is unrestrictive. It says simply that increments in
$W_y^\epsilon(\cdot)$ behave 'close' to a Wiener process for small $\epsilon$ -
no matter what $t$ is.

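C4.2. $\xi_y^\epsilon(t)$ takes the form
$\xi_y^\epsilon(t) = \xi_y(t/\epsilon^2)/\epsilon$, where $\xi_y(\cdot)$ is a
right continuous second order stationary process with integrable covariance
function $R(\cdot)$. Also, if $t_\epsilon \to \infty$ as $\epsilon \to 0$, then
$W_y^\epsilon(t_\epsilon + \cdot) - W_y^\epsilon(t_\epsilon) \Rightarrow W_y(\cdot)$.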

Remark. The model (C4.2) is a common way of modelling wide bandwidth
noise, and is used to simplify a calculation below, and to avoid the details
involved with other models. Note in particular that if $S_y(\omega)$ is the
spectral density of $\xi_y(\cdot)$, then $S_y(\epsilon^2 \omega)$ is the spectral
density of $\xi_y^\epsilon(\cdot)$. The $S_y(\epsilon^2 \omega)$ converges to the
spectral density of white noise as $\epsilon \to 0$, if $S_y(\cdot)$ is
continuous at $\omega = 0$. It will be clear from the proof below that the
condition can be considerably weakened. We also make the rather unrestrictive
assumption that the initial time is not important and that the
$z^\epsilon(\cdot)$ processes do not explode:

C4.3. If $\{z^\epsilon(t_\epsilon)\}$ converges weakly to a random variable
$z(0)$ as $\epsilon \to 0$, then $z^\epsilon(t_\epsilon + \cdot) \Rightarrow z(\cdot)$
with initial condition $z(0)$. Also

$\sup_{\epsilon, t} E|z^\epsilon(t)|^2 < \infty$.
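Returning to the scaling in (C4.2): the statement that $\xi_y(t/\epsilon^2)/\epsilon$ has spectral density $S_y(\epsilon^2\omega)$ can be checked on a simple case. The sketch below is an illustration only. It assumes a hypothetical scalar covariance $R(u) = e^{-|u|}$, for which $S(\omega) = 2/(1+\omega^2)$ under the convention $S(\omega) = \int R(u) e^{-i\omega u}\,du$ (other normalizations differ by a constant factor), and compares a trapezoid-rule transform of the scaled covariance with the closed form.

```python
import math


def covariance(u):
    """Hypothetical covariance R(u) = exp(-|u|) of the unscaled noise."""
    return math.exp(-abs(u))


def spectral_density(w):
    """Closed-form transform of exp(-|u|): S(w) = 2 / (1 + w**2)."""
    return 2.0 / (1.0 + w * w)


def scaled_covariance(tau, eps):
    """Covariance of xi(t / eps**2) / eps, namely R(tau / eps**2) / eps**2."""
    return covariance(tau / eps**2) / eps**2


def scaled_spectral_density_numeric(w, eps, t_max=10.0, steps=200_000):
    """Trapezoid rule for the integral of scaled_covariance(tau) * cos(w * tau)."""
    h = t_max / steps
    total = 0.5 * (scaled_covariance(0.0, eps)
                   + scaled_covariance(t_max, eps) * math.cos(w * t_max))
    for j in range(1, steps):
        tau = j * h
        total += scaled_covariance(tau, eps) * math.cos(w * tau)
    return 2.0 * h * total  # the integrand is even in tau


numeric = scaled_spectral_density_numeric(w=2.0, eps=0.5)
closed_form = spectral_density(0.5**2 * 2.0)  # S(eps**2 * w)
nearly_flat = spectral_density(0.01**2 * 100.0)  # close to S(0) for small eps
```

As $\epsilon$ decreases, the scaled density stays close to $S(0)$ over an ever wider frequency range, which is the sense in which the noise approaches white noise.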

Consistency. In order that $\hat z(\cdot)$, $\Sigma(\cdot)$ be a filter for
$z(\cdot)$, $y(\cdot)$, it is necessary that the initial conditions be
consistent. Let $N(\hat z, \Sigma; A)$ denote the probability that the normal
random variable (with mean $\hat z$ and covariance $\Sigma$) takes values in the
set $A$. By consistency, we mean that
$P\{z(0) \in A \mid \hat z(0), \Sigma(0)\} = N(\hat z(0), \Sigma(0); A)$. One
cannot choose the initial (random) conditions arbitrarily. It should be obvious
that if $\Sigma(0) = \bar\Sigma$ and $(z(0), \hat z(0))$ are the stationary
random variables for (stable) (2.2) and (2.5), then the initial conditions are
consistent.

The question of consistency arises in our work since as $\epsilon \to 0$ and
$t \to \infty$ we do not know a priori what the limits of
$(z^\epsilon(t), \hat z^\epsilon(t))$ are. When we study the asymptotics as
$t \to \infty$ and $\epsilon \to 0$, we will start the filter at some large
$t_\epsilon$, and the initial condition of the limit equations must be consistent
for the problem to make sense. Fortunately, they will be consistent - so we
will have a proper filter. This problem is considerably more difficult in the
non-linear case. Theorem 4.1 is the 'large time' extension of Theorem 3.1. The
question of consistency is either ignored in filtering - or else implicitly
assumed; e.g., one cannot allow both $z(0) = 1$ w.p.1 and that $\hat z(0)$ has a
normal distribution with a nonzero covariance.


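Theorem 4.1. Assume the conditions of Section 2 and (C4.1) - (C4.3). Let
$q(\cdot)$ be bounded and continuous and let $F(\cdot) \in D$. Define
$y^\epsilon(s) = 0$ for $s \le 0$, and define $y^\epsilon(-\infty, t; \cdot)$ to
be the 'reversed' function - with values ($0 \le \tau < \infty$)
$y^\epsilon(-\infty, t; \tau) = y^\epsilon(t - \tau)$. Then, if
$t_\epsilon \to \infty$ as $\epsilon \to 0$,

(4.1)  $\{z^\epsilon(t_\epsilon + \cdot),\ \hat z^\epsilon(t_\epsilon + \cdot),\ W_y^\epsilon(t_\epsilon + \cdot) - W_y^\epsilon(t_\epsilon)\} \Rightarrow (z(\cdot), \hat z(\cdot), W_y(\cdot))$

satisfying (2.2), (2.5), and $z(\cdot)$, $\hat z(\cdot)$ are stationary. Also
(3.1) holds in the form

(4.2)  $\lim_{\epsilon, t} E[q(z^\epsilon(t)) - F(y^\epsilon(-\infty, t; \cdot))]^2 \ge \lim_{\epsilon, t} E[q(z^\epsilon(t)) - (P_t^\epsilon, q)]^2$.

The limit of $(P_t^\epsilon, q)$ is the expectation with respect to the
stationary $(\hat z(\cdot), \bar\Sigma)$ system.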

Proof. Suppose that $\{\hat z^\epsilon(t),\ \epsilon > 0,\ t < \infty\}$ is tight
[i.e., $\sup_{\epsilon, t} P\{|\hat z^\epsilon(t)| \ge N\} \to 0$ as
$N \to \infty$]. Then, by the hypothesis,
$\{z^\epsilon(t), \hat z^\epsilon(t),\ \epsilon > 0,\ t < \infty\}$ is tight and
each subsequence of
$\{z^\epsilon(t_\epsilon + \cdot),\ \hat z^\epsilon(t_\epsilon + \cdot),\ W_y^\epsilon(t_\epsilon + \cdot) - W_y^\epsilon(t_\epsilon),\ t_\epsilon < \infty,\ \epsilon > 0\}$
has a weakly convergent subsequence with limit satisfying (2.2), (2.5). Choose a
weakly convergent subsequence (with $t_\epsilon \to \infty$), also indexed by
$\epsilon$ and with limit denoted by $z(\cdot)$, $\hat z(\cdot)$, $W_y(\cdot)$.
Suppose, for the moment, that $z(\cdot)$, $\hat z(\cdot)$ is stationary.
(Clearly, $\Sigma(t) \to \bar\Sigma$ as $t \to \infty$.) If all limits are
stationary, then the subsequence is irrelevant since the stationary solution is
unique. Also, since the initial conditions of $z(\cdot)$ and $\hat z(\cdot)$ are
consistent under stationarity, $(\hat z(\cdot), \bar\Sigma)$ is the optimal
filter for $y(\cdot)$, $z(\cdot)$. Inequality (4.2) is a consequence of this and
the weak convergence (by the argument used in Theorem 3.1).

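We next prove tightness of $\{\hat z^\epsilon(t),\ \epsilon > 0,\ t < \infty\}$,
and then the stationarity will be proved. We have

(4.3)  $\dot{\hat z}^\epsilon = [A_z - Q(t)H_z]\,\hat z^\epsilon + Q(t)\,\xi_y(t/\epsilon^2)/\epsilon + Q(t) H_z z^\epsilon(t)$.

Let $\Phi(t,\tau)$ denote the fundamental matrix for $[A_z - Q(t)H_z]$. There are
$K < \infty$, $\lambda > 0$ such that $|\Phi(t,\tau)| \le K \exp[-\lambda(t-\tau)]$.
We have

$\hat z^\epsilon(t) = \Phi(t,0)\,\hat z^\epsilon(0) + \int_0^t \Phi(t,\tau)\,Q(\tau)\,\xi_y(\tau/\epsilon^2)\,d\tau/\epsilon + \int_0^t \Phi(t,\tau)\,Q(\tau) H_z z^\epsilon(\tau)\,d\tau$.

A straightforward calculation using (C4.2) - (C4.3) and the change of variable
$\tau/\epsilon^2 \to \tau$ in the first integral yields

$E|\hat z^\epsilon(t)|^2 \le \text{constant}\,(1 + E|\hat z^\epsilon(0)|^2)$,

giving the desired tightness.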

To prove the stationarity of the limit of any weakly convergent
subsequence, we need only show stationarity of the limit values
$(z(0), \hat z(0))$ of the initial conditions
$(z^\epsilon(t_\epsilon), \hat z^\epsilon(t_\epsilon))$. For this, we use a
'shifting' argument. Fix $T > 0$ and take a weakly convergent subsequence
(indexed also by $\epsilon$, and with $t_\epsilon \to \infty$) of

$\{z^\epsilon(t_\epsilon + \cdot),\ \hat z^\epsilon(t_\epsilon + \cdot),\ W_y^\epsilon(t_\epsilon + \cdot) - W_y^\epsilon(t_\epsilon),\ z^\epsilon(t_\epsilon - T + \cdot),\ \hat z^\epsilon(t_\epsilon - T + \cdot),\ W_y^\epsilon(t_\epsilon - T + \cdot) - W_y^\epsilon(t_\epsilon - T)\}$

with limit
$\{z(\cdot), \hat z(\cdot), W_y(\cdot), z_T(\cdot), \hat z_T(\cdot), W_{y,T}(\cdot)\}$.
Clearly, $z_T(T) = z(0)$ and $\hat z_T(T) = \hat z(0)$. We do not know what
$z_T(0)$ or $\hat z_T(0)$ are - but, uniformly in $T$, they belong to a tight set
(bounded in probability): i.e., owing to the tightness of
$\{z^\epsilon(t), \hat z^\epsilon(t),\ \epsilon > 0,\ t < \infty\}$, for each
$\rho > 0$, there is an $N_\rho < \infty$ such that
$P\{|z_T(0)| + |\hat z_T(0)| \ge N_\rho\} < \rho$ for all $T$ and limits of
convergent subsequences. Write (where $W_{z,T}(\cdot)$ 'drives' the equation for
$dz_T$)

$z(0) = z_T(T) = (\exp A_z T)\,z_T(0) + \int_0^T \exp[A_z (T-\tau)]\,B_z\,dW_{z,T}(\tau)$,

$\hat z(0) = \hat z_T(T) = (\exp[A_z - Q(\infty)H_z]T)\,\hat z_T(0) + \int_0^T \exp[(A_z - Q(\infty)H_z)(T-\tau)]\,Q(\infty)\,(dW_{y,T}(\tau) + H_z z_T(\tau)\,d\tau)$.

Since $T$ is arbitrary and the set of all possible $\{z_T(0)\}$ is tight, the
stability of $A_z$ and $(A_z - Q(\infty)H_z)$ implies that $z(0)$ is the
stationary random variable, hence $z(\cdot)$ is stationary. Similarly, the pair
$(z(\cdot), \hat z(\cdot))$ is stationary.

Q.E.D.

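Remark. There is no analog of Theorem 4.1 if $A_z$ is unstable (if
$t_\epsilon \to \infty$ as $\epsilon \to 0$), since the limit of
$z^\epsilon(t_\epsilon)$ then makes no sense. If $z^\epsilon(\cdot)$ satisfied
$\dot z^\epsilon = A_z z^\epsilon + B_z \xi_z^\epsilon(\cdot)$ for appropriate
$B_z$ and $\xi_z^\epsilon(\cdot)$ (such that the limit of
$z^\epsilon(t_\epsilon + \cdot)$ is $z(\cdot)$), then we can show that

$\lim_{\epsilon, t} E[z^\epsilon(t) - \hat z^\epsilon(t)]\,[z^\epsilon(t) - \hat z^\epsilon(t)]' = \bar\Sigma$.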


5.  The  Conditional  Gaussian  Problem. 


We now consider the 'wide bandwidth' observation noise analog of the conditional Gaussian problem [12]. Let $q_i(\cdot)$, $i = 1,2$, be bounded and continuously differentiable matrix valued functions with $q_2(x)q_2'(x) \ge \alpha I$ for some $\alpha > 0$. The signal $z^\epsilon(\cdot)$ and noise $\xi_y(\cdot)$ satisfy the conditions of Section 2, but the observation is of the 'wide bandwidth and conditional Gaussian type', where the coefficients are data dependent:

(5.1)  $\dot y^\epsilon = q_1(\hat z^\epsilon) z^\epsilon + q_2(\hat z^\epsilon)\,\xi^\epsilon_y(t),$

where $\xi^\epsilon_y(t) = \xi_y(t/\epsilon^2)/\epsilon$ also satisfies (C4.2) and (the rather unrestrictive) (C5.1) below.

C5.1.  $E\left|\int_t^\infty du\,\big[E[\xi_y(u)\xi_y'(s) \mid \xi_y(v), v \le 0] - R(u-s)\big]\right| \to 0$ as $s, t \to \infty$.

Define $R_0 = \int_{-\infty}^{\infty} R(u)\,du$. Formerly we used $R_0 = I$.

In (5.1), the $q_i(\cdot)$ can depend on the covariance $\Sigma^\epsilon(\cdot)$ given by (5.4) with no change in the results. The $q_i(\cdot)$ can also be more general functions of $y^\epsilon(\cdot)$, as will be clear from the development. For simplicity, we use (5.1). (The 'correction terms' are more complicated in the general case. See remarks below.) Such conditional Gaussian systems arise, for example, when one uses the observed data to orient or focus the observing mechanism, and the signal and noise strength depend on the orientation. The results of the previous sections are no longer directly applicable, since there is a 'correction term' due to the 'non-independence' of $\xi^\epsilon_y(t)$ and its coefficient $q_2(\hat z^\epsilon(t))$ in (5.1), and similarly for related terms in the filter (5.3).

To prepare ourselves for setting up the correct filter equations, it is useful to anticipate the 'correction terms' and center the $\epsilon$-filter appropriately so that the limit equations are the desired ones. Define the vector


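$F(\hat z, \Sigma, \xi) = \begin{bmatrix} q_2(\hat z)\,\xi \\ \Sigma\, q_1'(\hat z)[q_2(\hat z) R_0 q_2'(\hat z)]^{-1} q_2(\hat z)\,\xi \end{bmatrix}$

and $G = (G_1, \ldots, G_{k+m})$ by (recall that $y(t) \in R^k$ and $\hat z(t) \in R^m$ and let $F_{i,z_j}$ denote the derivative of $F_i$ with respect to $\hat z_j$)

$G_i(\hat z, \Sigma) = \int_0^\infty E \sum_j F_{i,z_j}(\hat z, \Sigma, \xi_y(t))\, F_j(\hat z, \Sigma, \xi_y(0))\,dt .$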

Let $G^y(\hat z, \Sigma)$ (resp., $G^z(\hat z, \Sigma)$) denote the first $k$ (resp., the last $m$) components of $G(\hat z, \Sigma)$. By Appendix 3, $G(\cdot)$ is the proper correction term for the $(y^\epsilon, \hat z^\epsilon)$ system, if $\hat z^\epsilon$ were defined by the appropriate 'conditional Gaussian' form of (2.5wb).

Define  the  centered  observation  and  filter 


GHz\  E«) 


(5.2) 
(5.3)  -  G^z«)

+  Qi'Cz*)  [q2(z^)Roq2'(z^)r^  [y^  -  qi(z^)z^] 

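(5.4)  $\dot\Sigma^\epsilon = A_z\Sigma^\epsilon + \Sigma^\epsilon A_z' + B_z B_z' - \Sigma^\epsilon q_1'(\hat z^\epsilon)[q_2(\hat z^\epsilon) R_0 q_2'(\hat z^\epsilon)]^{-1} q_1(\hat z^\epsilon)\Sigma^\epsilon .$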

(5.3) and (5.4) will be the proper filter for $z^\epsilon(\cdot)$, $y^\epsilon(\cdot)$, in the sense that the limit is the usual 'conditional Gaussian' filter and an analog of Theorem 3.1 or 4.1 can be proved. Define the system

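(5.5)  $dz = A_z z\,dt + B_z\,dW_z$

(5.6)  $dy = q_1(\hat z) z\,dt + q_2(\hat z)\,dW_y$

(5.7)  $d\hat z = A_z \hat z\,dt + \Sigma\, q_1'(\hat z)[q_2(\hat z) R_0 q_2'(\hat z)]^{-1}(dy - q_1(\hat z)\hat z\,dt)$

(5.8)  $\dot\Sigma = A_z\Sigma + \Sigma A_z' + B_z B_z' - \Sigma\, q_1'(\hat z)[q_2(\hat z) R_0 q_2'(\hat z)]^{-1} q_1(\hat z)\Sigma .$

Note that (5.7), (5.8) is the optimal filter for (5.5), (5.6), where $\text{cov}\,W_y(t) = tR_0$, and $W_z(\cdot)$ and $W_y(\cdot)$ are independent and $\text{cov}\,W_z(t) = tI$.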
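A scalar Euler discretization may help fix the structure of a filter of the form (5.7), (5.8), in which the gain depends on the current estimate. The sketch below is illustrative only: the scalar coefficients `a`, `b`, `r0`, the particular bounded smooth functions `q1` and `q2`, the step size and the seed are all assumptions introduced here, and the Euler scheme is a generic choice, not a method taken from the paper.

```python
import math
import random


def q1(zh: float) -> float:
    """Hypothetical bounded, smooth signal gain depending on the estimate."""
    return 1.0 + 0.5 * math.tanh(zh)


def q2(zh: float) -> float:
    """Hypothetical bounded, smooth noise gain with q2(x)**2 >= 1 > 0."""
    return 1.0 + 0.5 / (1.0 + zh * zh)


def simulate(a=-1.0, b=1.0, r0=1.0, T=5.0, dt=1e-3, seed=0):
    """Euler scheme for a scalar signal, observation, estimate and covariance."""
    rng = random.Random(seed)
    z, zh, sig = 0.0, 0.0, 1.0
    root_dt = math.sqrt(dt)
    for _ in range(int(T / dt)):
        dWz = rng.gauss(0.0, 1.0) * root_dt
        dWy = rng.gauss(0.0, 1.0) * root_dt * math.sqrt(r0)  # cov W_y(t) = t * r0
        c1, c2 = q1(zh), q2(zh)
        dy = c1 * z * dt + c2 * dWy                # observation increment
        noise_power = c2 * r0 * c2                 # q2 R0 q2'
        gain = sig * c1 / noise_power              # Sigma q1' [q2 R0 q2']^{-1}
        z_next = z + a * z * dt + b * dWz
        zh_next = zh + a * zh * dt + gain * (dy - c1 * zh * dt)
        sig_next = sig + (2.0 * a * sig + b * b - sig * c1 * c1 * sig / noise_power) * dt
        z, zh, sig = z_next, zh_next, sig_next
    return z, zh, sig


if __name__ == "__main__":
    print(simulate())
```

Because the coefficients are evaluated at the estimate, the covariance equation is itself data dependent, which is the feature that distinguishes the conditional Gaussian case from the standard linear one.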

Theorem  5.1  is  the  appropriate  analog  of  Theorem  3.1. 

Theorem 5.1. Assume the conditions of Section 2, (C4.2) and (C5.1). Let the system (5.5) - (5.8) have a unique solution (in the sense of distributions) for each initial condition. Then

(5.9)  $\{z^\epsilon(\cdot), \hat z^\epsilon(\cdot), W^\epsilon_y(\cdot), y^\epsilon(\cdot), \Sigma^\epsilon(\cdot)\} \Rightarrow (z(\cdot), \hat z(\cdot), W_y(\cdot), y(\cdot), \Sigma(\cdot)),$

where $z(\cdot)$ and $W_y(\cdot)$ are mutually independent and $\text{cov}\,W_y(t) = tR_0$. Also (3.1) holds.

Proof.  (3.1)  is  a  consequence  of  the  weak  convergence,  and  the  weak 
convergence  is  a  consequence  of  the  results  in  Appendix  3. 

Remark. The processes in (5.2) and (5.3) were centered so that the weak limits (5.7), (5.8) would be the correct filter for the limit system (5.5), (5.6). If we had not centered, then the limit of (the uncentered) $y^\epsilon(\cdot)$ equation would contain an additional drift term which would not be compensated for by the correction term in the limit of the uncentered $\hat z^\epsilon(\cdot)$ sequence; thus, the limit process $(\hat z(\cdot), \Sigma(\cdot))$ would not necessarily be a filter for the $(z(\cdot), y(\cdot))$ process.

Note that the correction (centering) terms involve first derivatives of $q_1(\cdot)$ and $q_2(\cdot)$ (although, via the centering, the limit does not involve the derivatives). This can lead to some unfortunate and generally ignored difficulties. Suppose that we can choose the $q_i(\cdot)$ and that we choose them to optimize some cost criterion. We cannot do the optimization with the $(y^\epsilon, \hat z^\epsilon, \Sigma^\epsilon)$ system because that would be computationally impossible, but we can (in principle) with the limit system. But, unless the resulting 'control' ($q_i(\cdot)$, $i = 1,2$) is continuously differentiable, it cannot be used, since the correction terms involve derivatives. In fact, it is not clear whether or not there is a weak convergence result for non-differentiable $q_i(\cdot)$. Similar problems arise wherever the coefficient of a 'wide bandwidth' noise process depends on a 'control'. If the $q_i(\cdot)$ depended on the $\dot y^\epsilon(\cdot)$ or $y^\epsilon(\cdot)$ in a different (but 'smooth') way, other than via $(\hat z^\epsilon, \Sigma^\epsilon)$, there will usually be an (even more complicated) correction term. But its form can be worked out by the methods of weak convergence theory.


6.  Nonlinear  Observations 

The ideas of the previous sections (and Section 8) are useful for problems which have a partly non-linear structure, but where the 'limit' system is linear. We now develop this for one special but important case. Many filtering or communication systems use limiters on the input for purposes of increasing robustness or for 'linear dynamic range' reasons, when the power of the input can vary over a large range. The input is put through a 'hard' limiter, then followed by a linear filter, whose purpose is to reconstruct the input. Such systems have been of great interest in communication theory. See [13], one of the first attempts to systematically analyze such a system. We treat one case, where the observation is scalar valued and is

(6.1)  $\dot y^\epsilon = k(H_z z^\epsilon + \xi^\epsilon_y(t))/\epsilon, \quad k(x) = \text{sign}(x), \quad y^\epsilon(0) = 0.$


The $1/\epsilon$ is a normalizing term and can be put anywhere in the filter system, as long as the system is linear. The normalization might or might not be used in practice. The qualitative results will remain the same, but the average power in the unnormalized observation goes to zero as the bandwidth of $\xi^\epsilon_y(\cdot)$ goes to $\infty$. A similar development (with the same results) can be carried out with the use of a 'soft' limiter; i.e., $k(x) = \text{sign}(x)$ for $|x| \ge c > 0$, $k(x) = x/c$ for $|x| < c$, and also if $k(\cdot)$ is vector valued.
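The limiters, and the effect of the $1/\epsilon$ normalization, can be made concrete. The sketch below is illustrative and rests on an assumption stated here: the noise entering the limiter at a fixed time is taken to be Gaussian with standard deviation $\sigma_0/\epsilon$ (as for the scaled Gauss-Markov noise used in this section). For Gaussian $N$ the identity $E\,\text{sign}(a + \sigma N) = \text{erf}(a/(\sigma\sqrt 2))$ is standard; the function names are hypothetical.

```python
import math


def hard_limiter(x: float) -> float:
    """k(x) = sign(x)."""
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)


def soft_limiter(x: float, c: float) -> float:
    """k(x) = sign(x) for |x| >= c, and x / c for |x| < c."""
    return hard_limiter(x) if abs(x) >= c else x / c


def averaged_hard_limiter(a: float, sigma: float) -> float:
    """E sign(a + sigma * N) for a standard normal N."""
    return math.erf(a / (sigma * math.sqrt(2.0)))


def normalized_mean(hz: float, sigma0: float, eps: float) -> float:
    """Mean of k(hz + xi) / eps when xi is N(0, (sigma0 / eps)**2)."""
    return averaged_hard_limiter(hz, sigma0 / eps) / eps


def limiting_gain(sigma0: float) -> float:
    """Small-eps slope of the normalized mean with respect to hz."""
    return math.sqrt(2.0 / math.pi) / sigma0


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01):
        print(eps, normalized_mean(2.0, 1.5, eps), limiting_gain(1.5) * 2.0)
```

Under these assumptions the normalized limiter output has a mean that becomes linear in the signal as $\epsilon \to 0$, with slope $\sqrt{2/\pi}/\sigma_0$, which is why a linear 'limit' system can arise from a hard nonlinearity.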


We  use 


C6.1.  $\xi^\epsilon_y(t) = \xi_y(t/\epsilon^2)/\epsilon$, where $\xi_y(\cdot)$ is a component of a stationary Gauss-Markov process whose correlation function goes to zero as $t \to \infty$ (hence $\to 0$ exponentially).


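Write $E(\xi_y(t))^2 = \sigma_0^2$. Then the average of (6.1) over the noise $\xi^\epsilon_y$ is

(6.2)  $\sqrt{2/\pi}\,\sigma_0^{-1} H_z z^\epsilon + \delta^\epsilon,$

where $\delta^\epsilon \to 0$ as $\epsilon \to 0$, uniformly for $z^\epsilon(t)$ in any bounded set. In preparation for the approximation result, define the systems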


dz  =  AjZ  dt  +  B^dWjj 


dy  =  fe]  H,2  dt  +  2J^  dWy 


dz  =  A  z  dt  +  Q(t)[dy  - 


E  =  A  E  +  Ea'  +  B  b'  -  Eh'h,  E  f— V] 

z  z  z  t  z  z  __2  J  L  ,1  J 


JlQ^  ^  ^  4J  > 


Jq  =  ^ J  sin'*  p(t)  dt  , 


where  p(  ■ )  is  the  correlation  function  of  5y(  ).  Define  z^(  )  by 


(6.9)  $\dot{\hat z}^\epsilon = A_z \hat z^\epsilon + Q(t)\big[\dot y^\epsilon - \sqrt{2/\pi}\,\sigma_0^{-1} H_z \hat z^\epsilon\big].$


Equations (6.5) to (6.8) represent the Kalman-Bucy filter for the system (6.3), (6.4). Equations (6.6), (6.9) represent the filter which one would normally use for the system $(z^\epsilon(\cdot), y^\epsilon(\cdot))$, and whose use we must justify. The justification is by Theorem 6.1.
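The inverse sine of the correlation function in the definition of $J_0$ reflects a classical identity for jointly Gaussian variables: if $X$, $Y$ are zero mean, unit variance and have correlation $\rho$, then $E[\text{sign}(X)\,\text{sign}(Y)] = (2/\pi)\sin^{-1}\rho$. The Monte Carlo sketch below checks this identity numerically; the sample size, seed and function names are illustrative assumptions.

```python
import math
import random


def sign_correlation_exact(rho: float) -> float:
    """(2 / pi) * arcsin(rho): correlation of the hard-limited Gaussian pair."""
    return 2.0 / math.pi * math.asin(rho)


def sign_correlation_mc(rho: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[sign(X) sign(Y)] for correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    total = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + s * rng.gauss(0.0, 1.0)  # Y has unit variance, corr(X, Y) = rho
        total += (1 if x > 0 else -1) * (1 if y > 0 else -1)
    return total / n


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.9):
        print(rho, sign_correlation_mc(rho), sign_correlation_exact(rho))
```

Integrating this limited-noise correlation over the time lag is what produces the effective noise intensity of the limit observation equation.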

Theorem 6.1. Assume the conditions of Section 2 and (C6.1). Then

(6.10)  $\{z^\epsilon(\cdot), \hat z^\epsilon(\cdot), y^\epsilon(\cdot)\} \Rightarrow (z(\cdot), \hat z(\cdot), y(\cdot)),$

and $W_y(\cdot)$ is independent of $z(\cdot)$. Also, (3.1) holds.

Remark. The power of the weak convergence methods is well illustrated by the relative ease of getting this result. The problem is very hard, due to the nature of the nonlinearity, and alternative approaches to even a small part of the analysis (e.g., as in the classical work [13]) are very involved.

Proof. The proof of the weak convergence follows from that in [14], or [1, Chapter 9.3], and (3.1) follows from the weak convergence, exactly as in the proof of Theorem 3.1. Actually, the proofs in [14], [1] use a signal $s(\cdot)$ which does not depend on $\epsilon$, but the proofs would be essentially unchanged if the actual $\epsilon$-dependent signal $z^\epsilon(\cdot)$ were used instead.


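(7.1)  $K^\epsilon(t) = \frac{1}{\epsilon}\int_t^\infty \big[E^\epsilon_t k(H_z z^\epsilon(s) + \xi^\epsilon_y(s)) - \bar E^\epsilon_t k(H_z z^\epsilon(s) + \xi^\epsilon_y(s))\big]\,ds,$

where $E^\epsilon_t$ denotes the expectation conditioned on $\{\xi^\epsilon_y(u), z^\epsilon(u), u \le t\}$, and $\bar E^\epsilon_t$ denotes the expectation conditioned on $\{z^\epsilon(u), u \le t\}$ and under the assumption that $\xi_y(s)$ is the stationary random variable. Let $\xi_y(t) = H_0\tilde\xi_y(t)$ for some matrix $H_0$, where $\tilde\xi_y(\cdot)$ is the Gauss-Markov process cited in (C6.1). Note that there are $\lambda > 0$ and $c_0 < \infty$ such that

(7.2)  $\big|\text{variance}[\text{stationary } \xi_y(t)] - \text{variance}[\xi_y(t) \mid \tilde\xi_y(0) = 0]\big| \le c_0 \exp(-\lambda t),$
       $\big|E[\xi_y(t) \mid \tilde\xi_y(0)]\big| \le (c_0 \exp(-\lambda t))\,|\tilde\xi_y(0)| .$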
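For intuition on (7.2), consider the simplest Gauss-Markov process, a scalar stationary Ornstein-Uhlenbeck process with decay rate $\lambda$ and stationary variance $\sigma^2$. This scalar model and the function names are assumptions made here for illustration; for it, the conditional mean and variance given the initial value are known in closed form.

```python
import math


def conditional_mean(x0: float, lam: float, t: float) -> float:
    """E[xi(t) | xi(0) = x0] for a scalar Ornstein-Uhlenbeck process."""
    return x0 * math.exp(-lam * t)


def conditional_variance(sigma2: float, lam: float, t: float) -> float:
    """Var[xi(t) | xi(0)], which does not depend on the initial value."""
    return sigma2 * (1.0 - math.exp(-2.0 * lam * t))


def variance_gap(sigma2: float, lam: float, t: float) -> float:
    """Stationary variance minus conditional variance."""
    return sigma2 - conditional_variance(sigma2, lam, t)


if __name__ == "__main__":
    sigma2, lam = 2.0, 0.7  # hypothetical values
    for t in (0.0, 1.0, 5.0, 10.0):
        print(t, conditional_mean(3.0, lam, t), variance_gap(sigma2, lam, t))
```

Both quantities decay exponentially in $t$, so bounds of the form (7.2) hold in this example with $c_0 = \max(1, \sigma^2)$.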

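Changing scale $s/\epsilon^2 \to s$ in (7.1) and multiplying the arguments of $k(\cdot)$ by $\epsilon$ yields

(7.3)  $K^\epsilon(t) = \epsilon\int_{t/\epsilon^2}^\infty \big[E^\epsilon_t k(\epsilon H_z z^\epsilon(\epsilon^2 s) + \xi_y(s)) - \bar E^\epsilon_t k(\epsilon H_z z^\epsilon(\epsilon^2 s) + \xi_y(s))\big]\,ds.$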


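For large initial conditions (at time $t/\epsilon^2$) $\tilde\xi_y(t/\epsilon^2)$, (7.1) might be large. For $|\tilde\xi_y(t/\epsilon^2)| \ge 1$ and $s - t/\epsilon^2 \ge O(\log|\tilde\xi_y(t/\epsilon^2)|)$, the conditional mean of $\tilde\xi_y(s)$ (given $\tilde\xi_y(t/\epsilon^2)$) will be $O(1)$. Thus, the contribution of this initial transient to $|K^\epsilon(t)|$ is

$O(\epsilon)\big[|\tilde\xi_y(t/\epsilon^2)| + 1\big].$

We now deal with initial values $\tilde\xi_y(t/\epsilon^2) = O(1)$. Let $N(a,b)$ denote a normally distributed random variable with mean $a$ and variance $b$. In evaluating the expression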
evaluating  the  expression 

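$E^\epsilon_t k(\epsilon H_z z^\epsilon(\epsilon^2 s) + \xi_y(s)) - \bar E^\epsilon_t k(\epsilon H_z z^\epsilon(\epsilon^2 s) + \xi_y(s)), \quad \epsilon^2 s \ge t,$

we can replace the conditional expectations by expectations over $\xi_y(s)$ only, where the first $\xi_y(s)$ can be taken to be $N(\delta_1, \sigma_0^2(1-\delta_2))$ and the second can be taken to be $N(0, \sigma_0^2)$, where $\delta_i \to 0$ exponentially as $(s - t/\epsilon^2) \to \infty$. For notational simplicity, set $\sigma_0 = 1$.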

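For small $\delta_i > 0$ and $z > 0$ (with a similar development for $z < 0$),

$|P\{|N(\delta_1, 1-\delta_2)| \le \epsilon z\} - P\{|N(0,1)| \le \epsilon z\}|$

$\le |P\{|N(0, 1-\delta_2)| \le \epsilon z\} - P\{|N(0,1)| \le \epsilon z\}| + O(\delta_1)$

$\le |P\{|N(0,1)| \le \epsilon z(1+2\delta_2)\} - P\{|N(0,1)| \le \epsilon z\}| + O(\delta_1)$

$\le 2P\{\epsilon z \le N(0,1) \le \epsilon z(1+2\delta_2)\} + O(\delta_1) = O(\epsilon z\delta_2) + O(\delta_1).$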
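The variance-perturbation step in this chain of inequalities can be checked numerically. The sketch below uses the standard identity $P\{|N(0,v)| \le a\} = \text{erf}(a/\sqrt{2v})$ and confirms, for a few hypothetical values of $a = \epsilon z$ and $\delta = \delta_2$, that the gap is nonnegative and bounded by $a\delta$. The function names and the test grid are illustrative assumptions.

```python
import math


def prob_abs_normal_below(a: float, variance: float) -> float:
    """P{|N(0, variance)| <= a} for a >= 0."""
    return math.erf(a / math.sqrt(2.0 * variance))


def variance_perturbation_gap(a: float, delta: float) -> float:
    """P{|N(0, 1 - delta)| <= a} - P{|N(0, 1)| <= a} for 0 <= delta < 1."""
    return prob_abs_normal_below(a, 1.0 - delta) - prob_abs_normal_below(a, 1.0)


if __name__ == "__main__":
    for a in (0.01, 0.1, 0.5):
        for delta in (0.01, 0.1, 0.3):
            print(a, delta, variance_perturbation_gap(a, delta), a * delta)
```

The gap scales like the product $a\delta$, which is the origin of the $O(\epsilon z\delta_2)$ term.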

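Putting these estimates into (7.3) and using the cited fact that $\delta_i \to 0$ exponentially, for some $\lambda > 0$ we have

(7.4)  $|K^\epsilon(t)| \le O(\epsilon)\big[|\tilde\xi_y(t/\epsilon^2)| + 1\big] + O(\epsilon^2)\int_{t/\epsilon^2}^\infty E^\epsilon_t|z^\epsilon(\epsilon^2 s)|\,\exp\big(-\lambda(s - t/\epsilon^2)\big)\,ds .$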


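Part 2. Write (6.9) as

(7.6)  $\dot{\hat z}^\epsilon = \big[A_z - \sqrt{2/\pi}\,\sigma_0^{-1} Q(t)H_z\big]\hat z^\epsilon + Q(t)k^\epsilon(t)/\epsilon,$

where we use

$k^\epsilon(t) = k(H_z z^\epsilon(t) + \xi^\epsilon_y(t)).$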


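For our stability argument, $Q(\infty)$ can be used in lieu of $Q(t)$ in (7.6) (justified by a 'perturbation' argument, which we omit). Define

$A_1 = \big[A_z - \sqrt{2/\pi}\,\sigma_0^{-1} Q(\infty)H_z\big]$

and let $P > 0$ be such that $A_1'P + PA_1 = -C < 0$. We start with the Liapunov function $\hat z'P\hat z = V(\hat z)$, and then 'perturb' it. See Appendix 4 for the definition of the operator $\hat A^\epsilon$ below. (It is essentially a 'differentiation' operator.) By Appendix 4, we have

(7.7)  $\hat A^\epsilon V(\hat z^\epsilon(t)) = \dot V(\hat z^\epsilon(t)) = -\hat z^{\epsilon\prime}(t)\, C\, \hat z^\epsilon(t) + 2\hat z^{\epsilon\prime}(t) P Q(\infty) k^\epsilon(t)/\epsilon.$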

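The second term on the right of (7.7) is not dominated by the first, and we 'perturb' the Liapunov function in order to 'control' the bad term. Define the perturbation

$V_1^\epsilon(t) = \frac{2}{\epsilon}\hat z^{\epsilon\prime}(t) P \int_t^\infty Q(\infty)\big[E^\epsilon_t k^\epsilon(s) - \bar E^\epsilon_t k^\epsilon(s)\big]\,ds .$

By Part 1, $|V_1^\epsilon(t)| = O(1)\,|\hat z^\epsilon(t)|\,|K^\epsilon(t)|$ and by Schwarz's inequality and (7.4),

(7.8)  $E|V_1^\epsilon(t)| = O(\epsilon)\,E^{1/2}|\hat z^\epsilon(t)|^2 = O(\epsilon)\big[1 + E|\hat z^\epsilon(t)|^2\big].$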

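Also, we can readily show that

(7.9)  $\hat A^\epsilon V_1^\epsilon(t) = -\frac{2}{\epsilon}\hat z^{\epsilon\prime}(t) P Q(\infty)\big[k^\epsilon(t) - \bar E^\epsilon_t k^\epsilon(t)\big] + \frac{2}{\epsilon}\dot{\hat z}^{\epsilon\prime}(t) P \int_t^\infty Q(\infty)\big[E^\epsilon_t k^\epsilon(s) - \bar E^\epsilon_t k^\epsilon(s)\big]\,ds .$

Recall that the $\bar E^\epsilon_t k^\epsilon(s)$ is the expectation over the stationary $\xi_y(s)$ only; i.e., the conditioning data is just $z^\epsilon(\cdot)$. It can be shown that

$E|\hat z^\epsilon(t)|\,|\bar E^\epsilon_t k^\epsilon(t)|/\epsilon = O(1)\,E^{1/2}|\hat z^\epsilon(t)|^2\,E^{1/2}|z^\epsilon(t)|^2 .$

By substituting (6.9) for $\dot{\hat z}^\epsilon$ and using our bound on the integral, the last term on the right of (7.9) is bounded by

$O(1)\big[|\hat z^\epsilon(t)| + 1\big]|K^\epsilon(t)| + O(1)|K^\epsilon(t)|/\epsilon .$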

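Define the perturbed Liapunov function $V^\epsilon(t) = V(\hat z^\epsilon(t)) + V_1^\epsilon(t)$. Putting the estimates together, evaluating $\hat A^\epsilon V^\epsilon(t)$ via (7.7) and (7.9) and cancelling the common $1/\epsilon$ terms (with opposite signs) in (7.7) and (7.9) and using (see (A4.1) in Appendix 4)

$EV^\epsilon(t) = EV^\epsilon(0) + \int_0^t E\hat A^\epsilon V^\epsilon(s)\,ds$

and the bound on $E|V_1^\epsilon(t)|$ yields

(7.10)  $EV(\hat z^\epsilon(t)) \le \text{constant} + O(\epsilon)\big[1 + E|\hat z^\epsilon(t)|^2\big] + \int_0^t (\text{constant})\,ds - \int_0^t E\hat z^{\epsilon\prime}(s) C \hat z^\epsilon(s)\,ds$
        $\quad + (\text{constant})\int_0^t E\big(|\hat z^\epsilon(s)| + 1\big)|K^\epsilon(s)|/\epsilon\,ds + (\text{constant})\int_0^t E^{1/2}|\hat z^\epsilon(s)|^2\,E^{1/2}|z^\epsilon(s)|^2\,ds .$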


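Using the inequality $|ab| \le a^2/c + cb^2$ for any $c > 0$ and (7.5) in the last two integrals of (7.10) yields, for some constants $c_i > 0$ and $c > 0$,

(7.11)  $E|\hat z^\epsilon(t)|^2 \le c_1(1+t) - c_2\int_0^t E|\hat z^\epsilon(s)|^2\,ds + \frac{c_3}{c}\int_0^t E|\hat z^\epsilon(s)|^2\,ds + c\,c_3\,t .$

By letting $c_3/c < c_2$, (7.11) implies that

(7.12)  $\sup_{\epsilon > 0,\ t < \infty} E|\hat z^\epsilon(t)|^2 < \infty .$

Finally, (7.12) implies the required tightness of $\{\hat z^\epsilon(t), \epsilon > 0, t < \infty\}$.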
(8.3)  $\dot y^\epsilon = \begin{bmatrix} H_x x^\epsilon \\ H_z z^\epsilon \end{bmatrix} + \xi^\epsilon, \quad y^\epsilon(0) = 0,$


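where the three processes $\int_0^t \xi^\epsilon(s)\,ds = W^\epsilon(t)$, $\int_0^t \xi^\epsilon_x(s)\,ds = W^\epsilon_x(t)$ and $z^\epsilon(\cdot)$ are mutually independent, and $W^\epsilon(\cdot) \Rightarrow W(\cdot)$, $W^\epsilon_x(\cdot) \Rightarrow W_x(\cdot)$, standard Wiener processes. Thus $\xi^\epsilon(\cdot)$ and $\xi^\epsilon_x(\cdot)$ are wide bandwidth noise processes. Correlations among these processes can also be handled, at the expense of a more complex notation.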

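Define the filters and limit system:

(8.4)  $\dfrac{d}{dt}\begin{bmatrix} \hat x^\epsilon \\ \hat z^\epsilon \end{bmatrix} = \begin{bmatrix} A_x \hat x^\epsilon \\ A_z \hat z^\epsilon \end{bmatrix} + \begin{bmatrix} D_x u \\ 0 \end{bmatrix} + Q(t)\left[\dot y^\epsilon - \begin{bmatrix} H_x \hat x^\epsilon \\ H_z \hat z^\epsilon \end{bmatrix}\right]$

(8.5)  $dy = \begin{bmatrix} H_x x \\ H_z z \end{bmatrix} dt + dW = H\begin{bmatrix} x \\ z \end{bmatrix} dt + dW$

(8.6)  $d\begin{bmatrix} \hat x \\ \hat z \end{bmatrix} = \begin{bmatrix} A_x \hat x \\ A_z \hat z \end{bmatrix} dt + \begin{bmatrix} D_x u \\ 0 \end{bmatrix} dt + Q(t)\left[dy - \begin{bmatrix} H_x \hat x \\ H_z \hat z \end{bmatrix} dt\right]$

(8.7)  $dx = A_x x\,dt + D_x u\,dt + B_x\,dW_x ,$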


with the obvious associated Riccati equation for the conditional covariance $\Sigma(\cdot)$ of $(x(\cdot), z(\cdot))$. Here $Q(t) = \Sigma(t)H'[\text{cov}\,W(1)]^{-1}$. Equation (8.4) will be the filter for $(x^\epsilon(\cdot), z^\epsilon(\cdot))$ with data $y^\epsilon(\cdot)$, and (8.6) is the filter for (8.5), (8.7). The cost functions for the control problem are

(8.8)  $R^\epsilon(u) = \int_0^T E\,r(x^\epsilon(t), z^\epsilon(t), u(t))\,dt,$

(8.9)  $R(u) = \int_0^T E\,r(x(t), z(t), u(t))\,dt,$

for bounded and continuous $r(\cdot,\cdot,\cdot)$ and some $T < \infty$.

The controls take values in a compact set $U$, and we let (see related definition of $D$ and $D_t$ in Section 3) $\mathcal{H}$ denote the set of $U$-valued measurable $(\psi,t)$ functions on $C^r[0,\infty) \times [0,\infty)$ which are continuous w.p.1 relative to Wiener measure. Let $\mathcal{H}_t$ denote the subclass which depends only on the function values up to time $t$. We view functions in $\mathcal{H}$ as the data dependent controls with value $u(y(\cdot),t)$ at time $t$ and data $y(\cdot)$. Let $\bar{\mathcal{H}}$ denote the subclass of functions $u(\cdot,\cdot) \in \mathcal{H}$ such that $u(\cdot,t) \in \mathcal{H}_t$ for all $t$ and with the use of control $u(y^\epsilon(\cdot),\cdot)$ (resp., $u(y(\cdot),\cdot)$), (8.2) and (8.4) (resp., (8.6), (8.7)) has a unique solution in the sense of distributions. These $u(y^\epsilon(\cdot),\cdot)$ and $u(y(\cdot),\cdot)$ are the admissible controls.

Commonly, one tries to use the model (8.5) to (8.7) to get a (nearly) optimal control for cost (8.9). This control would, in practice, actually be applied to the 'physical' system (8.2), (8.4), with actual cost function (8.9). Such controls would normally not be 'nearly' optimal in any strict sense for the physical systems and questions arise which are similar to those posed for the pure filtering problem: in particular, with respect to what class of comparison controls is such a control 'nearly optimal'. Again, weak convergence theory can provide some answers, although the problem is considerably more difficult, and the results less satisfactory.




Straightforward weak convergence arguments (using only the assumed weak convergence of the 'driving $W^\epsilon(\cdot)$, $W^\epsilon_x(\cdot)$ processes', and the uniqueness of the limit) can be used to prove Theorem 8.1. Let $\mathcal{M}$ denote the class of $U$-valued continuous functions $u(\cdot,\cdot,\cdot)$ such that with use of control with value $u(\hat x(t), \hat z(t), t)$ at time $t$, (8.6), (8.7) has a unique (weak sense) solution. Let $\mathcal{M}_0$ denote the subclass of controls (stationary controls) which do not depend on $t$ (for use in the next section). Let $u(y^\epsilon,\cdot)$, $u^\delta(\hat x^\epsilon, \hat z^\epsilon, \cdot)$ and $u^\delta(\hat x, \hat z, \cdot)$ denote the controls with values $u(y^\epsilon(\cdot),t)$, $u^\delta(\hat x^\epsilon(t), \hat z^\epsilon(t), t)$ and $u^\delta(\hat x(t), \hat z(t), t)$ at time $t$.


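Theorem 8.1. Assume the conditions above in this section. For $\delta > 0$, let there exist a control $u^\delta(\cdot,\cdot)$ in $\mathcal{M}$ which is $\delta$-optimal for (8.6), (8.7), (8.9), with respect to controls in $\bar{\mathcal{H}}$. Then, for any $u(\cdot,\cdot) \in \bar{\mathcal{H}}$,

(8.10)  $\liminf_\epsilon R^\epsilon(u(y^\epsilon,\cdot)) \ge \lim_\epsilon R^\epsilon(u^\delta(\hat x^\epsilon, \hat z^\epsilon, \cdot)) - \delta = R(u^\delta(\hat x, \hat z, \cdot)) - \delta .$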


Remark. It would be preferable if we could allow the comparison control $u(y^\epsilon(\cdot),\cdot)$ to depend on $\epsilon$ other than only via the values of $y^\epsilon(\cdot)$; i.e., for it to be a (say) $\delta$-optimal admissible control for the 'physical' $\epsilon$-system. This is possible, if we can a priori guarantee some smoothness (uniformly in $\epsilon$) of the obtained controls, so that a weak convergence argument can be carried out, yielding an admissible limit control for the filtering/control problem. But, in general, the limit of $\{u^\epsilon(y^\epsilon(\cdot),\cdot)\}$ would not necessarily be dependent only on the limit data $y$, even if $y^\epsilon(\cdot) \Rightarrow y(\cdot)$. This is clear from the examples in Section 2. Similar difficulties occur in all work concerning the existence of optimal controls under 'partial information'.

Extensions. The theorem can be carried over to the case where the observations (of both $x^\epsilon$ and $z^\epsilon$) are of the non-linear form (6.1), and to the conditional Gaussian case.

Theorem 8.1 can be readily extended to the non-linear case where $\dot x^\epsilon = b(x^\epsilon,u) + \sigma(x^\epsilon,\xi^\epsilon)$ and $\{x^\epsilon(\cdot)\}$ converges weakly to an appropriate diffusion for 'nice' controls, and where $x^\epsilon(t)$ can be observed without additive noise. If the noise term $\sigma(x^\epsilon,\xi^\epsilon)$ were of the control dependent form $\sigma(x^\epsilon,u,\xi^\epsilon)$, then there might not be a weak convergence result, unless $u(\cdot)$ were 'smooth'. In the 'smooth' case, there might be a correction term which depended on certain derivatives of the control! See Section 5 for additional comment on this point.
9.  Filtering  and  Control:  The  Large  Time  Case.

We  now  treat  the  filtering  and  control  analog  of  the  large  time  and 
bandwidth  problem  of  Section  4,  and  will  use  the  assumptions 


C9.1.  $\begin{bmatrix} A_x & 0 \\ 0 & A_z \end{bmatrix} = A$ is stable, $[A; H_x, H_z]$ is observable and $\left[A, \begin{bmatrix} B_x & 0 \\ 0 & B_z \end{bmatrix}\right]$ is controllable.

C9.2.  )  satisfies  (C4.2). 

The cost functions are

(9.1)  $\gamma^\epsilon(u) = \limsup_T \frac{1}{T}\int_0^T E\,r(x^\epsilon(t), z^\epsilon(t), u(t))\,dt$

(9.2)  $\gamma(u) = \limsup_T \frac{1}{T}\int_0^T E\,r(x(t), z(t), u(t))\,dt$

We adapt the point of view of [18, Section 6] and assume that the system can be Markovianized. This is incorporated in the following assumption.

C9.3.  For each $\epsilon > 0$, there is a random process $\psi^\epsilon(\cdot)$ such that $\{\psi^\epsilon(t), t < \infty\}$ is tight and for each $u(\cdot) \in \mathcal{M}_0$ ($\mathcal{M}_0$ defined above Theorem 8.1) $X^\epsilon(\cdot) = \{x^\epsilon(\cdot), z^\epsilon(\cdot), \hat x^\epsilon(\cdot), \hat z^\epsilon(\cdot), \psi^\epsilon(\cdot)\}$ is a right continuous homogeneous Markov-Feller process (with left hand limits).


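Remark. If $z^\epsilon(\cdot)$ satisfies $\dot z^\epsilon = A_z z^\epsilon + B_z\xi^\epsilon_z$, then the assumption (C9.3) holds if the driving noises satisfy (C9.3) and (C9.1), (C9.2) hold; i.e., if the noises $\xi^\epsilon_z(\cdot)$ and $\xi^\epsilon_x(\cdot)$ can be written as functions of a suitable Markov process. Let $u(\hat x^\epsilon, \hat z^\epsilon)$ and $u(\hat x, \hat z)$ (and similarly for $u^\delta$) denote controls with values $u(\hat x^\epsilon(t), \hat z^\epsilon(t))$ and $u(\hat x(t), \hat z(t))$ at time $t$.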

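Theorem 9.1. Assume the conditions of Theorem 8.1 and (C9.1) - (C9.3). Let $\xi^\epsilon(\cdot)$ and $\xi^\epsilon_x(\cdot)$ satisfy (C4.2) and let $z^\epsilon(\cdot)$ satisfy (C4.3). For $\delta > 0$, let there be a $\delta$-optimal* control $u^\delta(\cdot,\cdot) \in \mathcal{M}_0$ for the system (8.1), (8.6), (8.7), and cost (9.2), and for which (8.1), (8.6), (8.7) has a unique invariant measure. Then, for $u(\cdot,\cdot) \in \mathcal{M}_0$,

(9.3)  $\liminf_\epsilon \gamma^\epsilon(u(\hat x^\epsilon, \hat z^\epsilon)) \ge \lim_\epsilon \gamma^\epsilon(u^\delta(\hat x^\epsilon, \hat z^\epsilon)) - \delta = \gamma(u^\delta(\hat x, \hat z)) - \delta .$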

Proof. Fix $u(\cdot,\cdot) \in \mathcal{M}_0$. Define the 'averaged transition measure'

$P^\epsilon_T(\cdot) = \frac{1}{T}E\int_0^T P\{X^\epsilon(t) \in \cdot \mid X^\epsilon(0)\}\,dt,$

where the expectation $E$ is over the possibly random initial conditions, and $X^\epsilon(\cdot)$ is the process corresponding to the use of $u(\hat x^\epsilon(\cdot), \hat z^\epsilon(\cdot))$. By the hypothesis, $\{P^\epsilon_T(\cdot), T \ge 0\}$ is tight. Also (writing $X = (x, z, \hat x, \hat z)$)

$\gamma^\epsilon(u(\hat x^\epsilon, \hat z^\epsilon)) = \limsup_T \int r(x, z, u(\hat x, \hat z))\,P^\epsilon_T(dX).$

*By $\delta$-optimal, one means that it is $\delta$-optimal with respect to all non-anticipative (with respect to the observed data) measurable $U$-valued controls, for each initial condition.


Let $T_n \to \infty$ be a sequence such that it attains the limit $\limsup$, and for which $P^\epsilon_{T_n}(\cdot)$ converges weakly to a measure, which we denote by $P^\epsilon(\cdot)$. Using the 'Feller' property and the right continuity, it is not hard to show that $P^\epsilon(\cdot)$ is an invariant measure for $X^\epsilon(\cdot)$. Also, by construction of $P^\epsilon(\cdot)$,

$\gamma^\epsilon(u(\hat x^\epsilon, \hat z^\epsilon)) = \int r(x, z, u(\hat x, \hat z))\,P^\epsilon(dX) .$


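Let $(x^\epsilon_0(\cdot), z^\epsilon_0(\cdot), \hat x^\epsilon_0(\cdot), \hat z^\epsilon_0(\cdot))$ denote the first four components of the stationary Markov-Feller $X^\epsilon(\cdot)$-process associated with the invariant measure $P^\epsilon(\cdot)$. By our hypotheses (see the argument in Section 4) $\{x^\epsilon_0(\cdot), z^\epsilon_0(\cdot), \hat x^\epsilon_0(\cdot), \hat z^\epsilon_0(\cdot)\}$ converges weakly to a limit $(x_0(\cdot), z_0(\cdot), \hat x_0(\cdot), \hat z_0(\cdot))$ satisfying (8.7), (8.1), (8.6). Also, the limit must be stationary, since the $(x^\epsilon_0(\cdot), z^\epsilon_0(\cdot), \hat x^\epsilon_0(\cdot), \hat z^\epsilon_0(\cdot))$ is for each $\epsilon$. Let $\mu^u(\cdot)$ denote the invariant measure associated with this stationary limit. Then

$\gamma^\epsilon(u(\hat x^\epsilon, \hat z^\epsilon)) \to \gamma(u(\hat x, \hat z)) = \int r(x, z, u(\hat x, \hat z))\,\mu^u(dx\,dz\,d\hat x\,d\hat z) .$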


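By a similar argument, it can be shown that

$\gamma(u^\delta(\hat x, \hat z)) = \int r(x, z, u^\delta(\hat x, \hat z))\,\mu^{u^\delta}(dx\,dz\,d\hat x\,d\hat z) = \lim_\epsilon \gamma^\epsilon(u^\delta(\hat x^\epsilon, \hat z^\epsilon)) .$

(The uniqueness of the invariant measure $\mu^{u^\delta}(\cdot)$ is used here.) Inequality (9.3) now follows from the $\delta$-optimality of $u^\delta(\cdot)$. Q.E.D.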
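The role of the averaged transition measure can be seen in a finite-state, discrete-time analog. The sketch below is only an analogy under assumptions made here (a hypothetical two-state Markov chain with transition matrix `P` and cost vector `r`): the time-averaged occupation measure converges to the invariant measure, and the average cost converges to the integral of the cost against that measure.

```python
def step(mu, P):
    """One transition: returns the row vector mu * P."""
    n = len(P[0])
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(n)]


def averaged_measure(mu0, P, T):
    """(1/T) * sum_{t < T} mu0 * P**t, the discrete averaged transition measure."""
    avg = [0.0] * len(mu0)
    mu = list(mu0)
    for _ in range(T):
        for j, m in enumerate(mu):
            avg[j] += m / T
        mu = step(mu, P)
    return avg


def average_cost(measure, r):
    """Integral of the cost r against a measure on the finite state space."""
    return sum(m * c for m, c in zip(measure, r))


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.4, 0.6]]  # hypothetical chain; invariant measure (0.8, 0.2)
    r = [1.0, 3.0]                # hypothetical cost per state
    for T in (10, 100, 5000):
        m = averaged_measure([0.0, 1.0], P, T)
        print(T, m, average_cost(m, r))
```

In this analog the limit does not depend on the initial distribution because the invariant measure is unique, which mirrors the use of uniqueness in the proof.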

Extensions. As for the case of Theorem 8.1, we do not know how to work with arbitrary admissible $u^\epsilon(\cdot)$ as comparison controls. But Theorem 9.1 can be extended in many ways. Perhaps the simplest is the following. For arbitrary $q$ and $t_i > 0$, let $u(t)$ depend on $(y(t-t_i) - y(t), i \le q)$ or (for the $\epsilon$-system) on $(y^\epsilon(t-t_i) - y^\epsilon(t), i \le q)$ as appropriate, as well as on $\hat x(t), \hat z(t)$, or on $\hat x^\epsilon(t), \hat z^\epsilon(t)$, and enlarge the class $\mathcal{M}_0$ to include such dependencies. Then the theorem remains true. More generally, we can allow $u(\cdot)$ to depend on other functionals of the data, provided that those functionals, together with $X^\epsilon(\cdot)$ can be 'appropriately Markovianized', so that the scheme of the proof can be used, and the uniqueness and non-anticipative properties continue to hold.


Appendix  1,  Weak  Convergence  Definitions. 

Let $P_n(\cdot)$ be the measure associated with a Euclidean $r$-space ($R^r$) valued random variable $X_n$. We say that $\{X_n\}$ or $\{P_n\}$ is tight (equivalently, $\{X_n\}$ is bounded in probability) if $\sup_n P_n\{|X_n| \ge N\} \to 0$ as $N \to \infty$. If $\{X_n\}$ is tight, then by the Helly-Bray Theorem, there is a subsequence $\{n_i\}$ and a measure $P(\cdot)$ and associated random variable $X$ such that $X_{n_i} \to X$ in distribution. Equivalently, $Ef(X_{n_i}) \to Ef(X)$ for each bounded and continuous function $f(\cdot)$. In fact, $f(\cdot)$ can be any bounded measurable function for which $P\{x: f(\cdot) \text{ discontinuous at } x\} = 0$ ([3], Theorem 5.1). As seen in the text, this is a useful generalization.

Let $C[0,\infty)$ denote the space of continuous functions on $[0,\infty)$ with values in $R^r$ (we always omit the $r$-dependence in the notation), with the topology of uniform convergence on each bounded interval. The metric on $C[0,\infty)$ can be taken to be

$d(x(\cdot), y(\cdot)) = \int_0^\infty e^{-t} \min\big[1, \sup_{s \le t}|x(s) - y(s)|\big]\,dt .$
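A metric of this type is easy to approximate numerically. The sketch below is a minimal illustration, assuming the running supremum is capped at 1 so that the integral converges; the truncation horizon, the midpoint quadrature and the function name `c_metric` are choices made here, not part of the text.

```python
import math


def c_metric(x, y, horizon: float = 40.0, n: int = 4000) -> float:
    """Midpoint-rule approximation of
    int_0^inf exp(-t) * min(1, sup_{s <= t} |x(s) - y(s)|) dt,
    truncated at `horizon`; the running sup is taken over the grid points."""
    dt = horizon / n
    running_sup = 0.0
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        running_sup = max(running_sup, abs(x(t) - y(t)))
        total += math.exp(-t) * min(1.0, running_sup) * dt
    return total


if __name__ == "__main__":
    print(c_metric(math.sin, math.cos))
    print(c_metric(lambda t: t, lambda t: t + 0.3))
```

The exponential weight means that two paths are close in this metric exactly when they are uniformly close on each bounded interval, with late-time discrepancies discounted.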

Let $D[0,\infty)$ denote the space of ($R^r$-valued) functions on $[0,\infty)$ which are right continuous and have left hand limits, and with the Skorohod topology. See [3], [4] for a discussion of this topology. The topology can be metrized so that the space is complete and separable. If $x(\cdot)$ is continuous, then $x_n(\cdot) \to x(\cdot)$ in this topology if and only if the convergence is uniform on each bounded interval. This is all that we need to know here. If $d_T(\cdot,\cdot)$ is the metric on $D[0,T]$, then (as above), the metric on $D[0,\infty)$ can be taken to be $\int_0^\infty e^{-t} d_t(x(\cdot), y(\cdot))\,dt$. The spaces $C[0,\infty)$ and $D[0,\infty)$ are the two most useful (currently) spaces for the study of the convergence of a sequence of random processes. Even if the paths are continuous, it is often more convenient to work with $D[0,\infty)$.

Let $P_n(\cdot)$ be a measure on $D[0,\infty)$ associated with a random process $x_n(\cdot)$ (which we call $X_n$) whose paths are in $D[0,\infty)$ w.p.1. We say that $P_n(\cdot)$ converges weakly (written $\Rightarrow$) to a measure $P(\cdot)$ associated with a process $X = x(\cdot)$ with paths in $D[0,\infty)$ if $Ef(X_n) \to Ef(X)$ for each bounded continuous function $f(\cdot)$ on $D[0,\infty)$. We might also write $X_n \Rightarrow X$. If there is weak convergence, then $f(\cdot)$ can be any measurable function which is continuous only almost everywhere with respect to the limit measure $P(\cdot)$ [3, Theorem 5.1]. The sequence $\{X_n\}$ or $\{P_n\}$ is said to be tight if for each $\delta > 0$, there is a compact set $K_\delta \subset D[0,\infty)$ such that $P_n\{X_n \in K_\delta\} \ge 1-\delta$ for all $n$. If $\{X_n\}$ is tight, then there is a subsequence $\{n_i\}$ and a $P(\cdot)$ on $D[0,\infty)$ (with associated process $X = x(\cdot)$) such that $X_{n_i} \Rightarrow X$. Analogous definitions and facts hold for processes with paths in $C[0,\infty)$.

There are many useful criteria for tightness and for identifying the limits. For purposes of analysis, it is often useful to alter the probability space so that there is a stronger type of convergence. The choice of the probability space does not affect the weak convergence result, since the distributions of the $X_n$ never change.

Skorohod imbedding (sometimes called Skorohod representation) [1], [19]. Let $P_n \Rightarrow P$ on $D[0,\infty)$ (or on $C[0,\infty)$). There is a probability space $(\tilde\Omega, \tilde B, \tilde P)$ with processes $\tilde X_n, \tilde X$ defined on it so that $\tilde P\{\tilde X_n \in A\} = P\{X_n \in A\}$, $\tilde P\{\tilde X \in A\} = P\{X \in A\}$ for any Borel set $A \subset D[0,\infty)$ (or in $C[0,\infty)$, if we are working in this space) and $d(\tilde X_n, \tilde X) \to 0$ w.p.1. Thus, if we wish, we can alter the probability space so that we get w.p.1 convergence in the metric of $D[0,\infty)$ (or $C[0,\infty)$), without altering the distributions of each process $X_n$ or $X$. This device often facilitates the analysis.


Appendix  2.  Proof  of  Lemma  2.1. 


Proof. Choose a finite partition $G = (G_0, G_1, \ldots)$ of the range space of $Y$ such that

(A2.1)  $P\{Y \in \partial G_i\} = 0$, all $i$; $\quad P\{Y \in G_i\} > 0$, $i > 0$; $\quad P\{Y \in G_0\} = 0 .$

(For notational simplicity we omit $G_0$ below.) Let $F$ (resp., $F_n$) denote the $\sigma$-algebra on $\Omega$ induced by $\{I_{G_i}(Y), i > 0\}$ (resp., $\{I_{G_i}(Y_n), i > 0\}$). Given $\delta > 0$, we can choose the partition such that

(A2.2)  $E[E(X|Y) - E(X|F)]^2 \le \delta .$

By Jensen's inequality,

(A2.3)  $E[E^2(X_n|F_n)] \le E[E^2(X_n|Y_n)] .$

By (A2.3) and the arbitrariness of $\delta$, to prove the lemma, we need only show that

$\limsup_n E[X_n - E(X_n|F_n)]^2 = \limsup_n\,[EX_n^2 - E E^2(X_n|F_n)] \le E[X - E(X|F)]^2 = EX^2 - E[E^2(X|F)],$

or, equivalently (since $EX_n^2 \to EX^2$), that

(A2.4)  $\liminf_n E E^2(X_n|F_n) \ge E E^2(X|F) .$

(A2.4) follows from Bayes' rule, the weak convergence and Fatou's lemma since

$\liminf_n E E^2(X_n|F_n) = \liminf_n \sum_i \frac{E^2[X_n I_{G_i}(Y_n)]}{P(Y_n \in G_i)} \ge \sum_i \frac{E^2[X I_{G_i}(Y)]}{P(Y \in G_i)} = E E^2(X|F) .$

[We used the fact (see Appendix 1) that if $P\{Y \in \partial G_i\} = 0$, then $X_n I_{G_i}(Y_n) \Rightarrow X I_{G_i}(Y)$.]
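The partition formula used in this proof, $E E^2(X|F) = \sum_i E^2[X I_{G_i}(Y)]/P(Y \in G_i)$, and the Jensen inequality (A2.3) can be checked exactly on a small discrete example. The joint distribution and the partition below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def second_moment_of_cond_mean(joint, cell):
    """E[E^2(X | F)] where F is generated by the cells cell(Y).

    `joint` is a list of (probability, x, y) atoms and `cell` maps y to a
    cell label; the result is sum_i E^2[X 1{Y in G_i}] / P(Y in G_i)."""
    num = defaultdict(float)  # E[X 1{Y in G_i}]
    den = defaultdict(float)  # P(Y in G_i)
    for p, x, y in joint:
        num[cell(y)] += p * x
        den[cell(y)] += p
    return sum(num[c] ** 2 / den[c] for c in den if den[c] > 0)


if __name__ == "__main__":
    # hypothetical joint law of (X, Y): (probability, x, y)
    joint = [(0.2, 1.0, 0.1), (0.3, -1.0, 0.4), (0.1, 2.0, 0.6), (0.4, 0.5, 0.9)]
    coarse = second_moment_of_cond_mean(joint, lambda y: 0 if y < 0.5 else 1)
    fine = second_moment_of_cond_mean(joint, lambda y: y)  # conditioning on Y itself
    print(coarse, fine)
```

Conditioning on the coarser partition gives the smaller value, in agreement with (A2.3); refining the partition closes the gap, which is the content of (A2.2).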


Appendix  3.  A  Method  for  Getting  Weak  Convergence. 


In this section, we outline a method for showing that a sequence of solutions to a wide bandwidth noise driven ODE converges weakly to a diffusion, and identify the diffusion. The method is taken from [1, Chapter 5], and is a slight simplification of the method in [8].

Let $x^\epsilon(\cdot)$ be defined by

(A3.1)  $\dot x^\epsilon = K(x^\epsilon) + F(x^\epsilon)\,\xi(t/\epsilon^2)/\epsilon,$

where $\xi(\cdot)$ is a second order stationary right continuous process with left hand limits and integrable correlation function $R(\cdot)$, and the functions $K(\cdot)$ and $F(\cdot)$ are continuous, (A3.1) has a unique solution and $F(\cdot)$ is continuously differentiable. Define $R_0 = \int_{-\infty}^{\infty} E\xi(u)\xi'(0)\,du$, and assume that

(A3.2)  $E\left|\int_t^\infty du\,\big[E(\xi(u)\xi'(s) \mid \xi(\tau), \tau \le 0) - R(u-s)\big]\right| \to 0$ as $t, s \to \infty$.

The condition is not very restrictive. We use it here only because it allows the use of a convenient reference.

Define the diffusion operator $\mathcal{L}$ and function $G = (G_1, \ldots)$ by

(A3.3)  $\mathcal{L}f(x) = f_x'(x)K(x) + \int_0^\infty E\big[f_x'(x)F(x)\xi(t)\big]_x' F(x)\xi(0)\,dt$
        $\quad = \sum_i f_{x_i}(x)\,G_i(x) + \tfrac{1}{2}\,\text{trace}\,\{f_{xx}(x)\} \cdot \{F(x)R_0F'(x)\},$

where $(G_1, \ldots) = G$ are the coefficients of the first derivatives in (A3.3).

The operator $\mathcal{L}$ is the differential generator of the Ito process

(A3.4)  $dx = G(x)\,dt + F(x)R_0^{1/2}\,dw,$

where $w(\cdot)$ is a standard Wiener process. Suppose that (A3.4) has a unique solution in the sense of distributions. Then, by [1, Chapter 5.8.4], if $x^\epsilon(0) \Rightarrow x(0)$, then $x^\epsilon(\cdot) \Rightarrow x(\cdot)$, with initial condition $x(0)$.
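In the scalar case the correction term is simple to write down. The sketch below assumes scalar $x$ and $\xi$ and a symmetric correlation function, so that $\int_0^\infty R(t)\,dt = R_0/2$ and the first-derivative coefficient in (A3.3) reduces to $G(x) = K(x) + \tfrac{1}{2}R_0 F'(x)F(x)$. The function names and the finite-difference derivative are illustrative assumptions.

```python
import math


def ito_drift(K, F, r0: float, x: float, h: float = 1e-5) -> float:
    """G(x) = K(x) + 0.5 * r0 * F'(x) * F(x), with F' from a central difference."""
    dF = (F(x + h) - F(x - h)) / (2.0 * h)
    return K(x) + 0.5 * r0 * dF * F(x)


def ito_diffusion(F, r0: float, x: float) -> float:
    """Diffusion coefficient F(x) * sqrt(r0) of the limit Ito equation."""
    return F(x) * math.sqrt(r0)


if __name__ == "__main__":
    # hypothetical example: K = 0, F(x) = x, R0 = 1
    K = lambda x: 0.0
    F = lambda x: x
    for x in (0.5, 1.0, 2.0):
        print(x, ito_drift(K, F, 1.0, x), ito_diffusion(F, 1.0, x))
```

For $K = 0$, $F(x) = x$ and $R_0 = 1$ the limit is $dx = (x/2)\,dt + x\,dw$, so the wide bandwidth system converges to an Ito equation with an extra drift even though the pre-limit equation has none.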

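Appendix 4. A Weak Infinitesimal Operator $\hat A^\epsilon$.

Refer to the notation of Section 6. Let $f(\cdot)$, $g(\cdot)$ be real valued (progressively) measurable functions of $z^\epsilon(\cdot)$, $\xi^\epsilon_y(\cdot)$ and let $E^\epsilon_t$ denote the expectation conditioned on $z^\epsilon(s), \xi^\epsilon_y(s), s \le t$. Define the operator $\hat A^\epsilon$ by: $f(\cdot) \in$ domain of $\hat A^\epsilon$ and $\hat A^\epsilon f = g$ if for each $T$,

$\limsup_{\Delta \to 0}\ \sup_{\epsilon,\ t \le T} E\left|\frac{E^\epsilon_t f(t+\Delta) - f(t)}{\Delta}\right| < \infty,$

$\sup_{t \le T} E|g(t)| < \infty,$

$E\left|\frac{E^\epsilon_t f(t+\Delta) - f(t)}{\Delta} - g(t)\right| \to 0$ as $\Delta \to 0$, each $t$.

Then ([1], [8], [16], [17]), for $s \ge 0$, $t \ge 0$,

(A4.1)  $E^\epsilon_t f(t+s) - f(t) = \int_t^{t+s} E^\epsilon_t \hat A^\epsilon f(u)\,du.$

The $\hat A^\epsilon$ operator plays the role of an infinitesimal operator for non-Markov processes. The relationship (A4.1) has many applications (see the references) in weak convergence theory.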